IEICE global.ieice.org Site

Keyword Search Result

[Keyword] high-level synthesis(66hit)

21-40hit(66hit)

A Thermal-Aware High-Level Synthesis Algorithm for RDR Architectures through Binding and Allocation
Kazushi KAWAMURA Masao YANAGISAWA Nozomu TOGAWA

PAPER-VLSI Design Technology and CAD

Vol: E96-A No: 1
Page(s): 312-321
With process technology scaling, heat in ICs is becoming a serious issue. Since high temperature adversely affects reliability, design costs, and leakage power, it is necessary to incorporate thermal-aware synthesis into IC design flows. In particular, hot spots, where a chip is locally overheated, are a serious concern, and reducing the peak temperature inside a chip is very important. At the same time, the increase in average interconnect delay is also becoming a serious issue. By using RDR architectures (Regular-Distributed-Register architectures), interconnect delays can be easily estimated and their influence can be much reduced even in high-level synthesis. In this paper, we propose a thermal-aware high-level synthesis algorithm for RDR architectures. The RDR architecture divides the entire chip into islands of uniform area. Our algorithm balances the energy consumption among islands through re-binding to functional units. By allocating additional functional units to vacant areas on islands, our algorithm further balances the energy consumption among islands and thus reduces the peak temperature. Experimental results demonstrate that our algorithm reduces the peak temperature by up to 9.1% compared with the conventional approach.
A Formal Approach to Optimal Register Binding with Ordered Clocking for Clock-Skew Tolerant Datapaths
Keisuke INOUE Mineo KANEKO

PAPER-Logic Synthesis, Test and Verification

Vol: E95-A No: 12
Page(s): 2330-2337
The impact of clock-skew on circuit timing increases rapidly as technology scales. As a result, it becomes important to deal with clock-skew at the early stages of circuit design. This paper presents a novel datapath design that aims at mitigating the impact of clock-skew in high-level synthesis by integrating margin (evaluated as the maximum number of clock-cycles to absorb clock-skew) and ordered clocking into high-level synthesis tasks. As a first step toward the proposed datapath design, this paper presents a 0-1 integer linear programming formulation that focuses on register binding to achieve the minimum cost (the minimum number of registers) under a given scheduling result. Experimental results show that optimal results can be obtained without increasing the latency and with only a few extra registers compared to traditional high-level synthesis designs.
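For orientation, the baseline problem behind this abstract (binding variables to the minimum number of registers under a fixed schedule) can be sketched with the classical left-edge heuristic. This is not the paper's 0-1 ILP formulation and it ignores margin and ordered clocking entirely; the variable names and lifetimes below are hypothetical.

```python
def left_edge_binding(lifetimes):
    """Bind variables to registers with the classical left-edge heuristic.

    lifetimes maps a variable name to (start, end); the value is assumed
    live over the half-open control-step interval [start, end).
    Returns a dict mapping each variable to a register index.
    """
    free_at = []   # free_at[r] = step at which register r becomes free
    binding = {}
    # Process variables in order of increasing lifetime start ("left edge").
    for var, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        for r, t in enumerate(free_at):
            if t <= start:          # register r is free again: reuse it
                free_at[r] = end
                binding[var] = r
                break
        else:                       # every register is busy: allocate a new one
            free_at.append(end)
            binding[var] = len(free_at) - 1
    return binding


# Hypothetical lifetimes taken from some fixed schedule.
example = {"a": (0, 2), "b": (1, 3), "c": (2, 4), "d": (3, 5)}
print(left_edge_binding(example))
```

With non-overlapping lifetimes sharing a register, the four example variables fit in two registers. An ILP formulation such as the one in the abstract would add further constraints on top of this baseline.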
Iterative Synthesis Methods Estimating Programmable-Wire Congestion in a Dynamically Reconfigurable Processor
Takao TOI Takumi OKAMOTO Toru AWASHIMA Kazutoshi WAKABAYASHI Hideharu AMANO

PAPER-High-Level Synthesis and System-Level Design

Vol: E94-A No: 12
Page(s): 2619-2627
Iterative synthesis methods that take wire congestion into account are proposed for a multi-context dynamically reconfigurable processor (DRP) with a large number of processing elements (PEs) and programmable-wire connections. Although complex data-paths can be synthesized using the programmable wires, their delay is long, especially when wire connections are congested. We propose two iterative synthesis techniques between a high-level synthesizer (HLS) and the place & route tool to shorten the prolonged wire delay. First, we feed back wire delays for each context to a scheduler in the HLS. The experimental results showed that the critical-path delay was shortened by 21% on average for applications with timing closure problems. Second, we skip the routing and estimate wire delays based on the congestion. The synthesis time was shortened to 1/3, at the cost of degrading the delay improvement rate by two points on average.
A Hierarchical Criticality-Aware Architectural Synthesis Framework for Multicycle Communication
Chia-I CHEN Juinn-Dar HUANG

PAPER-VLSI Design Technology and CAD

Vol: E93-A No: 7
Page(s): 1300-1308
In the deep submicron era, wire delay is no longer negligible and is becoming a dominant factor in system performance. To cope with the increasing wire delay, several state-of-the-art architectural synthesis flows have been proposed for distributed register architectures by enabling on-chip multicycle communication. In this article, we present a new performance-driven criticality-aware synthesis framework, CriAS, targeting regular distributed register architectures. To achieve high system performance, CriAS features a hierarchical binding-then-placement scheme for minimizing the number of performance-critical global data transfers. The key ideas are to take time criticality as the major concern at the earlier binding stages, before detailed physical placement information is available, and to preserve the locality of closely related critical components in the later placement phase. The experimental results show that CriAS can achieve an average overall performance improvement of 14.26% with no runtime overhead compared to the previous art.
Floorplan-Aware High-Level Synthesis for Generalized Distributed-Register Architectures
Akira OHCHI Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI

PAPER-High-Level Synthesis and System-Level Design

Vol: E92-A No: 12
Page(s): 3169-3179
As device feature size decreases, interconnection delay becomes the dominating factor of total circuit delay. Distributed-register architectures can reduce the influence of interconnection delay. They may, however, increase circuit area because they require many local registers. Moreover, the original distributed-register architectures do not consider control signal delay, which may be the bottleneck in a circuit. In this paper, we propose a high-level synthesis method targeting a generalized distributed-register architecture in which we introduce shared/local registers and global/local controllers. Our method is based on iterative improvement of scheduling/binding and floorplanning. First, we prepare shared-register groups with global controllers, each of which corresponds to a single functional unit. As iterations proceed, we use local registers and local controllers for functional units on a critical path. Shared-register groups physically located close to each other are merged into a single group. Accordingly, global controllers are merged. Finally, our method obtains a generalized distributed-register architecture in which scheduling/binding and floorplanning are simultaneously optimized. Experimental results show that the area is decreased by 4.7% while maintaining circuit performance equal to that obtained with the original distributed-register architectures.
Word-Level Equivalence Checking in Bit-Level Accuracy by Synthesizing Designs onto Identical Datapath
Tasuku NISHIHARA Takeshi MATSUMOTO Masahiro FUJITA

PAPER-Hardware Verification

Vol: E92-D No: 5
Page(s): 972-984
Equivalence checking is one of the most important issues in VLSI design to guarantee that bugs do not enter designs during optimization or synthesis steps. In this paper, we propose a new word-level equivalence checking method between two models before and after high-level synthesis or behavioral optimization. Our method converts two given designs into RTL models that have the same datapaths, so that behaviors under identical control signals become the same in the two designs. Functional units also become common to the two designs. Word-level equivalence checking techniques can then be applied in bit-level accuracy. In addition, we propose a rule-based equivalence checking method that can verify designs with complicated control structures faster than existing symbolic simulation based methods. Experimental results with realistic examples show that our method can verify such designs in practical time.
High-Level Synthesis of Software Function Calls
Masanari NISHIMURA Nagisa ISHIURA Yoshiyuki ISHIMORI Hiroyuki KANBARA Hiroyuki TOMIYAMA

LETTER-High-Level Synthesis and System-Level Design

Vol: E91-A No: 12
Page(s): 3556-3558
This letter presents a novel framework in high-level synthesis where hardware modules synthesized from functions in a given ANSI-C program can call the other software functions in the program. This enables high-level synthesis from C programs that contain calls to hard-to-synthesize functions, such as dynamic memory management, I/O requests, or very large and complex functions. A single-thread implementation scheme is shown, whose correctness has been verified through register transfer level simulation.
Evaluation of Interconnect-Complexity-Aware Low-Power VLSI Design Using Multiple Supply and Threshold Voltages
Hasitha Muthumala WAIDYASOORIYA Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-High-Level Synthesis and System-Level Design

Vol: E91-A No: 12
Page(s): 3596-3606
This paper presents a high-level synthesis approach to minimize the total power consumption in behavioral synthesis under time and area constraints. The proposed method has two stages, functional unit (FU) energy optimization and interconnect energy optimization. In the first stage, active and inactive energies of the FUs are optimized using a multiple supply and threshold voltage scheme. Genetic algorithm (GA) based simultaneous assignment of supply and threshold voltages and module selection is proposed. The proposed GA based searching method can be used in large size problems to find a near-optimal solution in a reasonable time. In the second stage, interconnects are simplified by increasing their sharing. This is done by exploiting similar data transfer patterns among FUs. The proposed method is evaluated for several benchmarks under 90 nm CMOS technology. The experimental results show that more than 40% of energy savings can be achieved by our proposed method.
Memory Allocation for Multi-Resolution Image Processing
Yasuhiro KOBAYASHI Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-VLSI Systems

Vol: E91-D No: 10
Page(s): 2386-2397
Hierarchical approaches using multi-resolution images are well-known techniques for reducing the amount of computation without degrading quality. One major issue in designing image processors is to design a memory system that supports parallel access with a simple interconnection network. The complexity of the interconnection network mainly depends on memory allocation, which maps pixels onto memory modules and determines the required number of memory modules. This paper presents a memory allocation method to minimize the number of memory modules for image processing using multi-resolution images. For efficient search, the proposed method exploits the regularity of window-type image processing. A practical example demonstrates that the number of memory modules is reduced to less than 14% of that of conventional methods.
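To make "mapping pixels onto memory modules" concrete, the sketch below shows a simple textbook-style interleaved mapping for a single-resolution image, in which every p x q window touches p*q distinct modules and can therefore be read in parallel. It is an assumed baseline for illustration only, not the allocation method of the paper, and the function names are hypothetical.

```python
def module_index(x, y, p, q):
    """Map pixel (x, y) to one of p*q memory modules (interleaved mapping)."""
    return (x % p) + p * (y % q)


def window_is_conflict_free(x0, y0, p, q):
    """Check that the p x q window with top-left corner (x0, y0)
    hits every memory module exactly once."""
    modules = {
        module_index(x0 + dx, y0 + dy, p, q)
        for dy in range(q)
        for dx in range(p)
    }
    return len(modules) == p * q


# Any 3 x 3 window position is conflict-free with 9 modules.
print(all(window_is_conflict_free(x, y, 3, 3)
          for x in range(10) for y in range(10)))
```

This baseline needs one module per window pixel; the abstract's contribution concerns reducing the module count when several resolutions are processed together.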
An ILP Approach to the Simultaneous Application of Operation Scheduling and Power Management
Shih-Hsu HUANG Chun-Hua CHENG

PAPER-VLSI Design Technology and CAD

Vol: E91-A No: 1
Page(s): 375-382
At the behavioral level, large power savings are possible by shutting down unused operations, which is commonly referred to as power management. However, operation scheduling has a significant impact on the potential for power saving via power management. In this paper, we present an integer linear programming (ILP) model to formally formulate the simultaneous application of operation scheduling and power management in high level synthesis. Our objective is to maximize the power saving under both the timing constraints and the resource constraints. Note that our approach guarantees solving the problem optimally. Compared with previous work, experimental data consistently show that our approach has significant relative improvement in the power savings.
Temporal Partitioning to Amortize Reconfiguration Overhead for Dynamically Reconfigurable Architectures
Jinhwan KIM Jeonghun CHO Tag Gon KIM

PAPER-Reconfigurable Device and Design Tools

Vol: E90-D No: 12
Page(s): 1977-1985
These days, many dynamically reconfigurable architectures have been introduced to fill the gap between ASICs and software-programmed processors such as GPPs and DSPs. These reconfigurable architectures have been shown to achieve higher performance than software-programmed processors. However, reconfigurable architectures suffer from a significant reconfiguration overhead and a speedup limitation. By reducing the reconfiguration overhead, the overall performance of reconfigurable architectures can be improved. Therefore, we describe temporal partitioning, which is able to amortize the reconfiguration overhead at the synthesis phase or at compilation time. Our temporal partitioning methodology splits a configuration context into temporal partitions to amortize reconfiguration overhead. We then present benchmark results to demonstrate the effectiveness of our methodology.
Max-Flow Scheduling in High-Level Synthesis
Liangwei GE Song CHEN Kazutoshi WAKABAYASHI Takashi TAKENAKA Takeshi YOSHIMURA

PAPER-VLSI Design Technology and CAD

Vol: E90-A No: 9
Page(s): 1940-1948
Scheduling, an essential step in high-level synthesis, is an intractable process. Traditional heuristic scheduling methods usually search schedules directly in the entire solution space. In this paper, we propose the idea of searching within an intermediate solution space (ISS). We put forward a max-flow scheduling method that heuristically prunes the solution space into a specific ISS and finds the optimum of ISS in polynomial time. The proposed scheduling algorithm has some unique features, such as the correction of previous scheduling decisions in a later stage, the simultaneous scheduling of all the operations, and the optimization of more complicated objectives. Aided by the max-flow scheduling method, we implement the optimization of the IC power-ground integrity problem at the behavior level conveniently. Experiments on well-known benchmarks show that without requiring additional resources or prolonging schedule latency, the proposed scheduling method can find a schedule that draws current more stably from a supply, which mitigates the voltage fluctuation in the on-chip power distribution network.
A Simultaneous Module Selection, Scheduling, and Allocation Method Considering Operation Chaining with Multi-Functional Units
Tsuyoshi SADAKATA Yusuke MATSUNAGA

PAPER

Vol: E90-A No: 4
Page(s): 792-799
A Multi-Functional unit has several functions, which can be switched with a control signal. In High-Level Synthesis, using Multi-Functional units in operation chaining makes it possible to obtain a solution with the same number of control steps and fewer resources than without them. This paper proposes an operation chaining method that considers Multi-Functional units. The method formulates module selection, scheduling, and functional unit allocation with operation chaining as a 0/1 integer linear problem and obtains an optimal solution with the minimum number of control steps under area and clock-cycle type constraints. The first contribution of this paper is to propose a global search for operation chaining with Multi-Functional units having multiple outputs as well as a single output. The second contribution is to consider the area constraint as a resource constraint instead of the type and number of functional units. Experimental results show that chaining with Multi-Functional units is effective and that the proposed method is useful for evaluating heuristic algorithms.
Bit-Length Optimization Method for High-Level Synthesis Based on Non-linear Programming Technique
Nobuhiro DOI Takashi HORIYAMA Masaki NAKANISHI Shinji KIMURA

PAPER-System Level Design

Vol: E89-A No: 12
Page(s): 3427-3434
High-level synthesis is a novel method to generate an RT-level hardware description automatically from a high-level language such as C, and is used in recent digital circuit design. Floating-point to fixed-point conversion with bit-length optimization is one of the key issues for area and speed optimization in high-level synthesis. However, the conversion task is rather tedious work for designers. This paper introduces an automatic bit-length optimization method for floating-point to fixed-point conversion in high-level synthesis. The method estimates computational errors statistically and formalizes the optimization problem as a non-linear problem. Applying an NLP technique improves the balance between computational accuracy and total hardware cost. Various constraints, such as unit sharing and the maximum bit-length of function units, can also be modeled easily. Experimental results show that our method is fast compared with a typical one and reduces the hardware area.
An ILP Approach to the Slack Driven Scheduling Problem
Shih-Hsu HUANG Chun-Hua CHENG

LETTER-VLSI Design Technology and CAD

Vol: E89-A No: 6
Page(s): 1852-1858
With the advent of the deep sub-micron era, there is a demand to consider the design closure problem in high-level synthesis. It is well known that slack is an effective means of tolerating the uncertainties in operation delays. Previous work attempted to increase the usable slack based on a given initial schedule. Instead of taking this post-processing approach, this paper is the first attempt at the simultaneous application of operation scheduling and slack optimization. We use a 0-1 integer linear programming (0-1 ILP) approach to formally formulate the problem. Under the design constraints (timing and resource), our approach is applicable to two different objective functions: the maximization of the total usable slack and the maximization of the number of non-zero slack operations. Compared with previous work, our approach has the following two advantages: first, it guarantees optimality; second, it is more suitable for design space exploration.
High-Level Power Optimization Based on Thread Partitioning
Jumpei UCHIDA Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI

PAPER-System Level Design

Vol: E87-A No: 12
Page(s): 3075-3082
This paper proposes a thread partitioning algorithm in low power high-level synthesis. The algorithm is applied to high-level synthesis systems. In the systems, we can describe parallel behaving circuit blocks (threads) explicitly. First it focuses on a local register file RF in a thread. It partitions a thread into two sub-threads, one of which has RF and the other does not have RF. The partitioned sub-threads need to be synchronized with each other to keep the data dependency of the original thread. Since the partitioned sub-threads have waiting time for synchronization, gated clocks can be applied to each sub-thread. Then we can synthesize a low power circuit with a low area overhead, compared to the original circuit. Experimental results demonstrate effectiveness and efficiency of the algorithm.
On Multiple-Voltage High-Level Synthesis Using Algorithmic Transformations
Lan-Rong DUNG Hsueh-Chih YANG

PAPER-Logic Synthesis

Vol: E87-A No: 12
Page(s): 3100-3108
This paper presents a multiple-voltage high-level synthesis approach for low power DSP applications using algorithmic transformation techniques. Our approach is motivated by the maximization of task mobilities, in that increasing mobilities may raise the possibility of assigning tasks to low-voltage components. The mobility is the freedom to schedule the starting time of a task. It is defined as the distance between its as-late-as-possible (ALAP) schedule time and its as-soon-as-possible (ASAP) schedule time. To gain task mobilities, we use loop shrinking, retiming and unfolding techniques. Loop shrinking first reduces the iteration period bound (IPB); the other techniques are then employed to shorten the iteration period (IP) as much as possible. The minimization of IP results in high task mobilities. Finally, we can assign tasks with high mobilities to low-voltage components and thus minimize energy under resource and latency constraints. Even when the overhead of level conversion is considered, our approach can achieve significant power reduction. In the case of the third-order IIR filter, the proposed approach can save up to 40.2% of power consumption.
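The mobility definition in this abstract (ALAP time minus ASAP time) can be illustrated with a minimal sketch. It assumes unit-delay operations, an acyclic dependence graph given in topological order, and a fixed latency in control steps; the graph and all names are hypothetical, and none of the paper's loop transformations are modeled.

```python
from collections import defaultdict


def mobilities(ops, deps, latency):
    """Compute mobility = ALAP - ASAP for unit-delay operations.

    ops     : operation names in topological order
    deps    : (predecessor, successor) pairs
    latency : number of control steps available (steps 0 .. latency-1)
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for p, s in deps:
        preds[s].append(p)
        succs[p].append(s)

    asap = {}
    for op in ops:                      # earliest step after all predecessors
        asap[op] = max((asap[p] + 1 for p in preds[op]), default=0)

    alap = {}
    for op in reversed(ops):            # latest step before all successors
        alap[op] = min((alap[s] - 1 for s in succs[op]), default=latency - 1)

    return {op: alap[op] - asap[op] for op in ops}


# Hypothetical graph: chain a -> b -> c, plus d -> c.
ops = ["a", "b", "d", "c"]
deps = [("a", "b"), ("b", "c"), ("d", "c")]
print(mobilities(ops, deps, latency=3))
```

In this example only `d` has non-zero mobility, so under the abstract's reasoning it would be the natural candidate for a slower, low-voltage component.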
Bit Length Optimization of Fractional Part on Floating to Fixed Point Conversion for High-Level Synthesis
Nobuhiro DOI Takashi HORIYAMA Masaki NAKANISHI Shinji KIMURA Katsumasa WATANABE

PAPER-Logic and High Level Synthesis

Vol: E86-A No: 12
Page(s): 3184-3191
In hardware synthesis from a high-level language such as C, the bit length of variables is one of the key issues for area and speed optimization. Usually, designers are required to optimize the bit length of each variable manually using time-consuming simulation on huge data sets. In this paper, we propose an optimization method for the fractional bit length in the conversion from floating-point variables to fixed-point variables. The method is based on error propagation and the backward propagation of the accuracy limitation. The method is fully analytical and fast compared to simulation based methods.
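As background for the fractional bit-length trade-off, the sketch below quantizes a single value with round-to-nearest and picks the smallest fractional bit length whose worst-case rounding error, 2^-(f+1), stays within a tolerance. It covers one variable in isolation and is only an assumed illustration; the paper's error propagation across operations is not modeled, and the function names are hypothetical.

```python
def quantize(x, frac_bits):
    """Round x to the nearest fixed-point value with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def min_frac_bits(tolerance):
    """Smallest f such that the rounding error bound 2**-(f+1) <= tolerance."""
    f = 0
    while 2.0 ** -(f + 1) > tolerance:
        f += 1
    return f


f = min_frac_bits(0.01)     # number of fractional bits needed
q = quantize(0.3, f)        # fixed-point approximation of 0.3
print(f, q, abs(q - 0.3))
```

Each extra fractional bit halves the error bound, which is why bit length trades accuracy directly against datapath width.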
High-Level Synthesis by Ants on a Tree
Rachaporn KEINPRASIT Prabhas CHONGSTITVATANA

PAPER-VLSI Design Technology and CAD

Vol: E86-A No: 10
Page(s): 2659-2669
In this paper an algorithm based on Ant Colony Optimization techniques called Ants on a Tree (AOT) is introduced. This algorithm can integrate many algorithms together to solve a single problem. The strength of AOT is demonstrated by solving a High-Level Synthesis problem. A High-Level Synthesis problem consists of many design steps and many algorithms to solve each of them. AOT can easily integrate these algorithms to limit the search space and use them as heuristic weights to guide the search. During the search, AOT generates a dynamic decision tree. A boosting technique similar to branch and bound algorithms is applied to guide the search in the decision tree. The storage explosion problem is eliminated by the evaporation of pheromone trail generated by ants, the inherent property of our search algorithm.
A High-Level Energy-Optimizing Algorithm for System VLSIs Based on Area/Time/Power Estimation
Shinichi NODA Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI

PAPER-High Level Synthesis

Vol: E85-A No: 12
Page(s): 2655-2666
This paper proposes a high-level energy-optimizing algorithm which can synthesize low energy system VLSIs. Given an initial system hardware obtained from an abstract behavioral description, the proposed algorithm applies to it the three energy reduction techniques, 1) reducing supply voltage, 2) selecting lower energy modules, and 3) applying gated clocks. By incorporating our area/delay/power estimation, the proposed algorithm can obtain low energy system VLSIs meeting the constraints of area, delay, and execution time. The proposed algorithm has been incorporated into a high-level synthesis system and experimental results demonstrate effectiveness and efficiency of the algorithm.
